Skip List
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a skip list (or skiplist) is a probabilistic
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
that allows \mathcal(\log n) average complexity for search as well as \mathcal(\log n) average complexity for insertion within an ordered sequence of n elements. Thus it can get the best features of a sorted
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
(for searching) while maintaining a linked list-like structure that allows insertion, which is not possible with a static array. Fast search is made possible by maintaining a linked hierarchy of subsequences, with each successive subsequence skipping over fewer elements than the previous one (see the picture below on the right). Searching starts in the sparsest subsequence until two consecutive elements have been found, one smaller and one larger than or equal to the element searched for. Via the linked hierarchy, these two elements link to elements of the next sparsest subsequence, where searching is continued until finally searching in the full sequence. The elements that are skipped over may be chosen probabilistically or deterministically, with the former being more common.


Description

A skip list is built in layers. The bottom layer 1 is an ordinary ordered linked list. Each higher layer acts as an "express lane" for the lists below, where an element in layer i appears in layer i+1 with some fixed probability p (two commonly used values for p are 1/2 or 1/4). On average, each element appears in 1/(1-p) lists, and the tallest element (usually a special head element at the front of the skip list) appears in all the lists. The skip list contains \log_n\, (i.e. logarithm base 1/p of n) lists. A search for a target element begins at the head element in the top list, and proceeds horizontally until the current element is greater than or equal to the target. If the current element is equal to the target, it has been found. If the current element is greater than the target, or the search reaches the end of the linked list, the procedure is repeated after returning to the previous element and dropping down vertically to the next lower list. The expected number of steps in each linked list is at most 1/p, which can be seen by tracing the search path backwards from the target until reaching an element that appears in the next higher list or reaching the beginning of the current list. Therefore, the total ''expected'' cost of a search is \tfrac\log_n which is \mathcal(\log n)\,, when p is a constant. By choosing different values of p, it is possible to trade search costs against storage costs.


Implementation details

The elements used for a skip list can contain more than one pointer since they can participate in more than one list. Insertions and deletions are implemented much like the corresponding linked-list operations, except that "tall" elements must be inserted into or deleted from more than one linked list. \mathcal(n) operations, which force us to visit every node in ascending order (such as printing the entire list), provide the opportunity to perform a behind-the-scenes derandomization of the level structure of the skip-list in an optimal way, bringing the skip list to \mathcal(\log n) search time. (Choose the level of the i'th finite node to be 1 plus the number of times it is possible to repeatedly divide i by 2 before it becomes odd. Also, i=0 for the negative infinity header as there is the usual special case of choosing the highest possible level for negative and/or positive infinite nodes.) However this also allows someone to know where all of the higher-than-level 1 nodes are and delete them. Alternatively, the level structure could be made quasi-random in the following way: make all nodes level 1 j ← 1 while the number of nodes at level j > 1 do for each i'th node at level j do if i is odd and i is not the last node at level j randomly choose whether to promote it to level j+1 else if i is even and node i-1 was not promoted promote it to level j+1 end if repeat j ← j + 1 repeat Like the derandomized version, quasi-randomization is only done when there is some other reason to be running an \mathcal(n) operation (which visits every node). The advantage of this quasi-randomness is that it doesn't give away nearly as much level-structure related information to an adversarial user as the de-randomized one. This is desirable because an adversarial user who is able to tell which nodes are not at the lowest level can pessimize performance by simply deleting higher-level nodes. (Bethea and Reiter however argue that nonetheless an adversary can use probabilistic and timing methods to force performance degradation.) The search performance is still guaranteed to be logarithmic. It would be tempting to make the following "optimization": In the part which says "Next, for each ''i''th...", forget about doing a coin-flip for each even-odd pair. Just flip a coin once to decide whether to promote only the even ones or only the odd ones. Instead of \mathcal(n \log n) coin flips, there would only be \mathcal(\log n) of them. Unfortunately, this gives the adversarial user a 50/50 chance of being correct upon guessing that all of the even numbered nodes (among the ones at level 1 or higher) are higher than level one. This is despite the property that there is a very low probability of guessing that a particular node is at level ''N'' for some integer ''N''. A skip list does not provide the same absolute worst-case performance guarantees as more traditional
balanced tree In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.Donald ...
data structures, because it is always possible (though with very low probability) that the coin-flips used to build the skip list will produce a badly balanced structure. However, they work well in practice, and the randomized balancing scheme has been argued to be easier to implement than the deterministic balancing schemes used in balanced binary search trees. Skip lists are also useful in
parallel computing Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different fo ...
, where insertions can be done in different parts of the skip list in parallel without any global rebalancing of the data structure. Such parallelism can be especially advantageous for resource discovery in an ad-hoc
wireless network A wireless network is a computer network that uses wireless data connections between network nodes. Wireless networking is a method by which homes, telecommunications networks and business installations avoid the costly process of introducing ...
because a randomized skip list can be made robust to the loss of any single node.


Indexable skiplist

As described above, a skip list is capable of fast \mathcal(\log n) insertion and removal of values from a sorted sequence, but it has only slow \mathcal(n) lookups of values at a given position in the sequence (i.e. return the 500th value); however, with a minor modification the speed of random access indexed lookups can be improved to \mathcal(\log n). For every link, also store the width of the link. The width is defined as the number of bottom layer links being traversed by each of the higher layer "express lane" links. For example, here are the widths of the links in the example at the top of the page: 1 10 o---> o---------------------------------------------------------> o Top level 1 3 2 5 o---> o---------------> o---------> o---------------------------> o Level 3 1 2 1 2 3 2 o---> o---------> o---> o---------> o---------------> o---------> o Level 2 1 1 1 1 1 1 1 1 1 1 1 o---> o---> o---> o---> o---> o---> o---> o---> o---> o---> o---> o Bottom level Head 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th NIL Node Node Node Node Node Node Node Node Node Node Notice that the width of a higher level link is the sum of the component links below it (i.e. the width 10 link spans the links of widths 3, 2 and 5 immediately below it). Consequently, the sum of all widths is the same on every level (10 + 1 = 1 + 3 + 2 + 5 = 1 + 2 + 1 + 2 + 3 + 2). To index the skip list and find the i'th value, traverse the skip list while counting down the widths of each traversed link. Descend a level whenever the upcoming width would be too large. For example, to find the node in the fifth position (Node 5), traverse a link of width 1 at the top level. Now four more steps are needed but the next width on this level is ten which is too large, so drop one level. Traverse one link of width 3. Since another step of width 2 would be too far, drop down to the bottom level. Now traverse the final link of width 1 to reach the target running total of 5 (1+3+1). function lookupByPositionIndex(i) node ← head i ← i + 1 ''# don't count the head as a step'' for level from top to bottom do while i ≥ node.width eveldo ''# if next step is not too far'' i ← i - node.width evel ''# subtract the current width'' node ← node.next evel ''# traverse forward at the current level'' repeat repeat return node.value end function This method of implementing indexing is detailed in ''"A skip list cookbook"'' by William Pugh


History

Skip lists were first described in 1989 by William Pugh. To quote the author:


Usages

List of applications and frameworks that use skip lists: *
Apache Portable Runtime The Apache Portable Runtime (APR) is a supporting library for the Apache web server. It provides a set of APIs that map to the underlying operating system (OS). Where the OS does not support a particular function, APR will provide an emulation. ...
implements skip lists. *
MemSQL SingleStore (formerly MemSQL) is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational, SQL database management system (RDBMS) that features ANSI SQL support, it is known for speed in data ...
uses lock-free skip lists as its prime indexing structure for its database technology. *
MuQSS The Brain Fuck Scheduler (BFS) is a process scheduler designed for the Linux kernel in August 2009 as an alternative to the Completely Fair Scheduler (CFS) and the O(1) scheduler. BFS was created by an experienced kernel programmer Con Koli ...
built on skip lists as a cpu scheduler. *
Cyrus IMAP server The Cyrus IMAP server is electronic mail server software developed by Carnegie Mellon University. It differs from other Internet Message Access Protocol (IMAP) server implementations in that it is generally intended to be run on sealed servers, ...
offers a "skiplist" backend DB implementation *
Lucene Apache Lucene is a free and open-source search engine software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. Lucene is widely used as ...
uses skip lists to search delta-encoded posting lists in logarithmic time. * The "QMap" key/value dictionary (up to Qt 4) template class of Qt is implemented with skip lists. * Redis, an ANSI-C open-source persistent key/value store for Posix systems, uses skip lists in its implementation of ordered sets. *"nessDB", a very fast key-value embedded Database Storage Engine (Using log-structured-merge (LSM) trees), uses skip lists for its memtable. * "skipdb" is an open-source implementation of an ACID ordered key/value database, implemented with a skip list instead of a b-tree. * ConcurrentSkipListSet and ConcurrentSkipListMap in the Java 1.6 API. Used by
Apache HBase HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File Sys ...
.
Speed Tables
are a fast key-value datastore for Tcl that use skiplists for indexes and lockless shared memory. * "leveldb" is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values *Con Kolivas' MuQSS Scheduler for the Linux kernel uses skip lists
SkiMap
uses skip lists as base data structure to build a more complex 3D Sparse Grid for Robot Mapping systems. * "IOWOW" is a persistent
C11 C11, C.XI, C-11 or C.11 may refer to: Transport * C-11 Fleetster, a 1920s American light transport aircraft for use of the United States Assistant Secretary of War * Fokker C.XI, a 1935 Dutch reconnaissance seaplane * LET C-11, a license-build var ...
key/value storage library that uses skip lists as its main data structure. Skip lists are used for efficient statistical computations of running medians (also known as moving medians). Skip lists are also used in distributed applications (where the nodes represent physical computers, and pointers represent network connections) and for implementing highly scalable concurrent priority queues with less lock contention, or even without locking, as well as
lock-free In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking im ...
concurrent dictionaries. There are also several US patents for using skip lists to implement (lockless) priority queues and concurrent dictionaries.


See also

* Bloom filter *
Skip graph Skip graphs are a kind of distributed data structure based on skip lists. They were invented in 2003 by James Aspnes and Gauri Shah. A nearly identical data structure called SkipNet was independently invented by Nicholas Harvey, Michael Jones, Ste ...


References


External links


"Skip list" entry
in the
Dictionary of Algorithms and Data Structures The NIST ''Dictionary of Algorithms and Data Structures'' is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data structures. For algorithms and ...

Skip Lists: A Linked List with Self-Balancing BST-Like Properties
on
MSDN Microsoft Developer Network (MSDN) was the division of Microsoft responsible for managing the firm's relationship with developers and testers, such as hardware developers interested in the operating system (OS), and software developers developing ...
in C# 2.0
Why Skip Lists?

Skip Lists lecture (MIT OpenCourseWare: Introduction to Algorithms)


Pat Morin Patrick Ryan Morin is a Canadian computer scientist specializing in computational geometry and data structures. He is a professor in the School of Computer Science at Carleton University. Education and career Morin was educated at Carleton Univers ...

Skip trees, an alternative data structure to skip lists in a concurrent approach

Skip tree graphs, a distributed version of skip trees

More on skip tree graphs, a distributed version of skip trees
{{Data structures Computer-related introductions in 1989 Linked lists Probabilistic data structures de:Liste (Datenstruktur)#Skip-Liste